
    Active Nearest-Neighbor Learning in Metric Spaces

    We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. MARMANN is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
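    The abstract does not spell out MARMANN's procedure, so the following is only a generic, minimal sketch of pool-based active nearest-neighbor learning: labels are requested for pool points whose nearest labeled neighbors disagree most, and the queried points then serve as a 1-NN classifier. The label oracle, the metric, and the scoring rule are illustrative assumptions, not the paper's method.

```python
# Minimal, illustrative pool-based active nearest-neighbor learner.
# This is NOT the MARMANN algorithm; it only sketches spending the label
# budget on pool points whose nearest labeled neighbors disagree.
import numpy as np

def active_nn_learn(pool, metric, query_label, budget, k=3, seed=0):
    """pool: (n, d) array of unlabeled points; metric(a, b) -> distance;
    query_label(x) -> label (user-supplied oracle); budget: number of queries."""
    rng = np.random.default_rng(seed)
    n = len(pool)
    budget = min(budget, n)
    labeled_idx, labels = [], {}

    # Seed with a few random queries.
    for i in rng.choice(n, size=min(k, budget), replace=False):
        labels[i] = query_label(pool[i])
        labeled_idx.append(i)

    while len(labels) < budget:
        # Score each unlabeled point by disagreement among its k nearest labeled neighbors.
        best, best_score = None, -1.0
        for i in range(n):
            if i in labels:
                continue
            nearest = sorted(labeled_idx, key=lambda j: metric(pool[i], pool[j]))[:k]
            votes = [labels[j] for j in nearest]
            disagreement = 1.0 - max(votes.count(v) for v in set(votes)) / len(votes)
            if disagreement > best_score:
                best, best_score = i, disagreement
        labels[best] = query_label(pool[best])
        labeled_idx.append(best)

    def predict(x):
        # 1-NN prediction using only the queried points.
        j = min(labeled_idx, key=lambda j: metric(x, pool[j]))
        return labels[j]
    return predict
```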

    Efficient Learning of Linear Separators under Bounded Noise

    We study the learnability of linear separators in $\Re^d$ in the presence of bounded (a.k.a. Massart) noise. This is a realistic generalization of the random classification noise model, where the adversary can flip each example $x$ with probability $\eta(x) \leq \eta$. We provide the first polynomial-time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit ball in $\Re^d$, for some constant value of $\eta$. While widely studied in the statistical learning theory community in the context of getting faster convergence rates, computationally efficient algorithms in this model had remained elusive. Our work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research. We additionally provide lower bounds showing that popular algorithms such as hinge loss minimization and averaging cannot lead to arbitrarily small excess error under Massart noise, even under the uniform distribution. Our work instead makes use of a margin-based technique developed in the context of active learning. As a result, our algorithm is also an active learning algorithm with label complexity that is only logarithmic in the desired excess error $\epsilon$.
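    As context for the margin-based technique the abstract refers to, here is a schematic and heavily simplified sketch of margin-based halfspace learning: each round minimizes a hinge loss only over examples inside a band around the current separator, and the band is then shrunk. The constants, schedule, and optimizer are illustrative assumptions, not the paper's algorithm or analysis.

```python
# Schematic margin-based learner for a halfspace (illustrative only).
import numpy as np

def hinge_grad(w, X, y, tau):
    """Gradient of the averaged hinge loss with scale tau."""
    margins = y * (X @ w) / tau
    active = margins < 1.0                        # points with positive hinge loss
    grad = -(y[active, None] * X[active]).sum(axis=0) / tau
    return grad / max(len(X), 1)

def margin_based_halfspace(X, y, rounds=5, steps=200, lr=0.05):
    """X: (n, d) features on the unit ball, y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w = np.ones(d) / np.sqrt(d)                   # arbitrary unit-length start
    band = 1.0
    for r in range(rounds):
        inside = np.abs(X @ w) <= band            # restrict to the current margin band
        Xb, yb = (X, y) if r == 0 else (X[inside], y[inside])
        tau = band / 2                            # hinge scale shrinks with the band
        for _ in range(steps):                    # plain gradient descent on the hinge loss
            w = w - lr * hinge_grad(w, Xb, yb, tau)
            w = w / np.linalg.norm(w)             # keep w on the unit sphere
        band /= 2                                 # halve the band each round
    return w
```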

    Advances in Neural Information Processing Systems

    Better understanding of the potential benefits of information transfer and representation learning is an important step towards the goal of building intelligent systems that are able to persist in the world and learn over time. In this work, we consider a setting where the learner encounters a stream of tasks but is able to retain only limited information from each encountered task, such as a learned predictor. In contrast to most previous works analyzing this scenario, we do not make any distributional assumptions on the task-generating process. Instead, we formulate a complexity measure that captures the diversity of the observed tasks. We provide a lifelong learning algorithm with error guarantees for every observed task (rather than on average). We show sample complexity reductions in comparison to solving every task in isolation in terms of our task complexity measure. Further, our algorithmic framework can naturally be viewed as learning a representation from encountered tasks with a neural network.
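    One way to make the setting concrete: the learner keeps only a linear predictor per observed task and reuses the retained predictors to form a shared low-dimensional representation for later tasks. This is a hypothetical illustration of the retain-little/transfer idea, not the algorithm analyzed in the paper; `task_stream`, `X_new`, and `y_new` are placeholder names.

```python
# Illustrative sketch of lifelong representation learning from retained predictors.
import numpy as np

def fit_linear(X, y, reg=1e-3):
    """Ridge-regularized least-squares predictor; the only thing retained per task."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def lifelong_representation(task_stream, k):
    """Stream over tasks, keep only each task's predictor, and return the
    top-k subspace spanned by the retained predictors (via SVD)."""
    retained = [fit_linear(X, y) for X, y in task_stream]
    W = np.stack(retained)                        # (num_tasks, d)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:k].T                               # (d, k) projection = shared representation

def solve_new_task(P, X_new, y_new):
    """A later task is solved in the k-dimensional representation,
    which is where any sample-complexity savings would come from."""
    w = fit_linear(X_new @ P, y_new)
    return lambda X: np.sign(X @ P @ w)

# Usage (placeholders): P = lifelong_representation(task_stream, k=5)
#                       predict = solve_new_task(P, X_new, y_new)
```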

    Adversarially Robust Learning with Tolerance

    We initiate the study of tolerant adversarial PAC-learning with respect to metric perturbation sets. In adversarial PAC-learning, an adversary is allowed to replace a test point $x$ with an arbitrary point in a closed ball of radius $r$ centered at $x$. In the tolerant version, the error of the learner is compared with the best achievable error with respect to a slightly larger perturbation radius $(1+\gamma)r$. This simple tweak helps us bridge the gap between theory and practice and obtain the first PAC-type guarantees for algorithmic techniques that are popular in practice. Our first result concerns the widely used "perturb-and-smooth" approach for adversarial learning. For perturbation sets with doubling dimension $d$, we show that a variant of these approaches PAC-learns any hypothesis class $\mathcal{H}$ with VC-dimension $v$ in the $\gamma$-tolerant adversarial setting with $O\left(\frac{v(1+1/\gamma)^{O(d)}}{\varepsilon}\right)$ samples. This is in contrast to the traditional (non-tolerant) setting in which, as we show, the perturb-and-smooth approach can provably fail. Our second result shows that one can PAC-learn the same class using $\widetilde{O}\left(\frac{d \cdot v\log(1+1/\gamma)}{\varepsilon^2}\right)$ samples even in the agnostic setting. This result is based on a novel compression-based algorithm, and achieves a linear dependence on the doubling dimension as well as the VC-dimension. This is in contrast to the non-tolerant setting, where there is no known sample complexity upper bound that depends polynomially on the VC-dimension. Comment: The paper was accepted for ALT 202
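    A minimal sketch of the perturb-and-smooth idea, under illustrative assumptions: training points are replaced by random samples from slightly enlarged balls around them, and a test point is classified by a majority vote over random points in its own ball. The radii, sample counts, and the `base_fit` learner are placeholders, not the paper's construction.

```python
# Sketch of a perturb-and-smooth learner (schematic only).
import numpy as np

def sample_ball(x, r, m, rng):
    """m uniform samples from the Euclidean ball of radius r around x."""
    d = x.shape[0]
    dirs = rng.normal(size=(m, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = r * rng.random(m) ** (1.0 / d)
    return x + dirs * radii[:, None]

def perturb_and_smooth(X, y, base_fit, r, gamma=0.5, m=20, seed=0):
    """base_fit(X, y) -> predictor taking a batch of points and returning labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    # 1) Perturb: train on random points drawn from enlarged balls around the data.
    X_aug = np.vstack([sample_ball(x, (1 + gamma) * r, m, rng) for x in X])
    y_aug = np.repeat(y, m)
    h = base_fit(X_aug, y_aug)

    # 2) Smooth: classify a test point by majority vote over its own r-ball.
    def predict(x_test):
        votes = h(sample_ball(x_test, r, m, rng))
        return 1 if votes.mean() >= 0 else -1     # labels assumed in {-1, +1}
    return predict
```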

    Learning with non-Standard Supervision

    Machine learning has enjoyed astounding practical success in a wide range of applications in recent years, success that often hurries ahead of our theoretical understanding. The standard framework for machine learning theory assumes full supervision, that is, training data consists of correctly labeled i.i.d. examples from the same task that the learned classifier is supposed to be applied to. However, many practical applications successfully make use of the sheer abundance of data that is currently produced. Such data may not be labeled or may be collected from various sources. The focus of this thesis is to provide a theoretical analysis of machine learning regimes where the learner is given such (possibly large amounts of) non-perfect training data. In particular, we investigate the benefits and limitations of learning with unlabeled data in semi-supervised learning and active learning, as well as the benefits and limitations of learning from data that has been generated by a task that is different from the target task (domain adaptation learning). For all three settings, we propose Probabilistic Lipschitzness to model the relatedness between the labels and the underlying domain space, and we discuss our suggested notion by comparing it to other common data assumptions.
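    As a concrete reading of Probabilistic Lipschitzness, assume the formalization in which $\phi(\lambda)$ bounds the probability that a point has an oppositely labeled point within distance $\lambda$. The snippet below estimates such a curve on a finite labeled sample; the exact definition used in the thesis may differ, and `X_sample`, `y_sample` are placeholders.

```python
# Hedged sketch: empirically estimate a Probabilistic-Lipschitzness-style curve.
import numpy as np

def empirical_pl_curve(X, y, lambdas):
    """X: (n, d) points, y: (n,) labels. For each lambda, return the fraction
    of points whose nearest oppositely labeled point lies within distance lambda."""
    n = len(X)
    d_opp = np.full(n, np.inf)
    for i in range(n):
        diff = y != y[i]
        if diff.any():
            d_opp[i] = np.linalg.norm(X[diff] - X[i], axis=1).min()
    return np.array([(d_opp <= lam).mean() for lam in lambdas])

# A curve that rises slowly indicates labels vary smoothly over the domain.
# curve = empirical_pl_curve(X_sample, y_sample, lambdas=np.linspace(0.0, 1.0, 11))
```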

    Generative Multiple-Instance Learning Models For Quantitative Electromyography

    We present a comprehensive study of the use of generative modeling approaches for Multiple-Instance Learning (MIL) problems. In MIL, a learner receives training instances grouped together into bags, with labels for the bags only (which might not be correct for the comprised instances). Our work was motivated by the task of facilitating the diagnosis of neuromuscular disorders using sets of motor unit potential trains (MUPTs) detected within a muscle, which can be cast as a MIL problem. Our approach leads to a state-of-the-art solution to the problem of muscle classification. By introducing and analyzing generative models for MIL in a general framework and examining a variety of model structures and components, our work also serves as a methodological guide to modelling MIL tasks. We evaluate our proposed methods both on MUPT datasets and on the MUSK1 dataset, one of the most widely used benchmarks for MIL. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013).
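    The paper's generative MIL models are not specified in the abstract, so the following is only a minimal generative baseline for bag classification: fit one diagonal Gaussian per bag label over the instances of bags carrying that label, and label a new bag by comparing total log-likelihoods. It illustrates the generative-modeling framing, not the models studied in the paper.

```python
# Minimal generative MIL baseline (illustrative only).
import numpy as np

def fit_diag_gaussian(Z, eps=1e-6):
    """Maximum-likelihood diagonal Gaussian over instances Z (rows)."""
    return Z.mean(axis=0), Z.var(axis=0) + eps

def diag_gaussian_loglik(Z, mu, var):
    """Total log-likelihood of the instances Z under the diagonal Gaussian."""
    return (-0.5 * (np.log(2 * np.pi * var) + (Z - mu) ** 2 / var)).sum()

def fit_mil(bags, bag_labels):
    """bags: list of (n_i, d) instance arrays; bag_labels: list of 0/1 labels."""
    pos = np.vstack([b for b, l in zip(bags, bag_labels) if l == 1])
    neg = np.vstack([b for b, l in zip(bags, bag_labels) if l == 0])
    return fit_diag_gaussian(pos), fit_diag_gaussian(neg)

def predict_bag(bag, pos_model, neg_model):
    """Label a bag by whichever class model gives its instances higher likelihood."""
    return int(diag_gaussian_loglik(bag, *pos_model) >
               diag_gaussian_loglik(bag, *neg_model))
```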

    When can unlabeled data improve the learning rate?

    Göpfert C, Ben-David S, Bousquet O, Gelly S, Tolstikhin I, Urner R. When can unlabeled data improve the learning rate? In: Conference on Learning Theory (COLT). 2019.